Code No.: 1
Presenter: Gunhee Kim (김건희)
Affiliation: Seoul National University
Department: Dept. of Computer Science and Engineering
Position: Professor
Session time: 16:00~18:00

Presenter bio:
2015-present: Assistant Professor, Dept. of Computer Science and Engineering, Seoul National University
2013-2015: Postdoctoral Researcher, Disney Research
2013: Ph.D., Computer Science Department, Carnegie Mellon University
2008: M.S., The Robotics Institute, Carnegie Mellon University
2006: Researcher, Korea Institute of Science and Technology (KIST)
2001: B.S./M.S., Department of Mechanical Engineering, KAIST
|
Talk abstract:
In this talk, I will introduce Poseidon, a scalable system architecture for distributed inter-machine communication in existing deep learning frameworks. Poseidon features three key contributions: (1) a three-level hybrid architecture that allows Poseidon to support both CPU-only and GPU-equipped clusters, (2) a distributed wait-free backpropagation (DWBP) algorithm that improves GPU utilization and balances communication, and (3) a structure-aware communication protocol (SACP) that minimizes communication overhead. I will also present experimental results showing that Poseidon converges to the same objectives as a single machine and achieves state-of-the-art training speedups across multiple models and well-established datasets on a commodity GPU cluster of 8 nodes (4.5x on AlexNet, 4x on GoogLeNet). On the much larger ImageNet 22K dataset, Poseidon with 8 nodes achieves a better speedup than, and accuracy competitive with, recent CPU-based distributed deep learning systems such as Adam and the system of Le et al., which use tens to thousands of nodes. Poseidon is an actively developed open-source framework; the current release is available at https://github.com/petuum/poseidon.
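To give a feel for the wait-free backpropagation idea mentioned in contribution (2), below is a minimal Python sketch of the general scheme: each layer's gradient is pushed to the parameter server as soon as that layer's backward step completes, so communication overlaps with the backward computation of the remaining layers instead of waiting for the full backward pass. All names here (Layer, push_gradient_async, wait_free_backprop) are hypothetical placeholders for illustration, not Poseidon's actual API.

    import threading

    class Layer:
        """Stand-in for one network layer with a local backward step."""
        def __init__(self, name):
            self.name = name
            self.grad = None

        def backward(self, upstream_grad):
            # Placeholder computation: record this layer's gradient and
            # return the gradient to propagate to the previous layer.
            self.grad = upstream_grad
            return upstream_grad

    def push_gradient_async(layer):
        """Hypothetical non-blocking push of one layer's gradient to a
        parameter server; runs in a background thread so backprop of the
        earlier layers can proceed concurrently."""
        def _send():
            print(f"sending gradient of {layer.name}")
        t = threading.Thread(target=_send)
        t.start()
        return t

    def wait_free_backprop(layers, loss_grad):
        """Backward pass from the last layer to the first, dispatching each
        gradient push as soon as that layer's backward step completes."""
        pending = []
        upstream = loss_grad
        for layer in reversed(layers):
            upstream = layer.backward(upstream)
            pending.append(push_gradient_async(layer))  # overlap comm/compute
        for t in pending:  # block only at the final synchronization point
            t.join()

    wait_free_backprop([Layer(f"layer{i}") for i in range(4)], loss_grad=1.0)

The point of the sketch is the scheduling pattern, not the arithmetic: the only blocking step is the final join, so network transfers for later layers proceed while earlier layers are still computing, which is what improves GPU utilization and spreads communication over the backward pass.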
|
|